
    A genetic algorithm for interpretable model extraction from decision tree ensembles

    Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in low predictive performance. Ensemble techniques provide a solution to this problem and are hence able to achieve higher accuracies. However, this comes at the cost of losing the excellent interpretability of the resulting model, making ensemble techniques impractical in applications where decision support, instead of decision making, is crucial. To bridge this gap, we present genesim, an algorithm that uses a genetic algorithm to transform an ensemble of decision trees into a single decision tree with enhanced predictive performance while maintaining interpretability. We compared genesim to prevalent decision tree induction algorithms, ensemble techniques and a similar technique, called ism, using twelve publicly available data sets. The results show that genesim achieves better predictive performance on most of these data sets than decision tree induction techniques and ism. The results also show that genesim's predictive performance is of the same order of magnitude as that of the ensemble techniques. However, the model produced by genesim outperforms the ensemble techniques in terms of interpretability, as it has a very low complexity.
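    A minimal sketch of the general idea behind such model extraction, assuming scikit-learn and a simple distillation-based fitness; the genome, the genetic operators and the complexity penalty below are illustrative assumptions, not the genesim procedure itself:

    ```python
    # Sketch: a genetic algorithm that searches for a single decision tree distilled from a
    # random-forest ensemble. Fitness = validation accuracy minus a small complexity penalty
    # (number of leaves). All names and weights are illustrative assumptions.
    import random
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
    ensemble = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

    def make_individual():
        # Genome: which ensemble members to imitate, plus the depth of the extracted tree.
        mask = [random.random() < 0.5 for _ in ensemble.estimators_]
        return (mask, random.randint(2, 8))

    def fitness(ind):
        mask, depth = ind
        members = [t for t, keep in zip(ensemble.estimators_, mask) if keep] or ensemble.estimators_[:1]
        # Distill: relabel the training set with the majority vote of the selected members.
        votes = np.mean([m.predict(X_tr) for m in members], axis=0)
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, (votes >= 0.5).astype(int))
        return tree.score(X_val, y_val) - 0.001 * tree.get_n_leaves(), tree

    def crossover(a, b):
        cut = random.randrange(len(a[0]))
        return (a[0][:cut] + b[0][cut:], random.choice([a[1], b[1]]))

    def mutate(ind):
        mask, depth = ind
        mask = [bit ^ (random.random() < 0.05) for bit in mask]  # flip a few membership bits
        if random.random() < 0.2:
            depth = max(2, min(8, depth + random.choice([-1, 1])))
        return (mask, depth)

    population = [make_individual() for _ in range(20)]
    for generation in range(15):
        scored = sorted(population, key=lambda ind: fitness(ind)[0], reverse=True)
        parents = scored[:10]  # truncation selection
        population = parents + [mutate(crossover(*random.sample(parents, 2))) for _ in range(10)]

    score, best_tree = fitness(max(population, key=lambda ind: fitness(ind)[0]))
    print(f"fitness={score:.3f}, leaves={best_tree.get_n_leaves()}")
    ```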

    An extensive experimental evaluation of automated machine learning methods for recommending classification algorithms

    This paper presents an experimental comparison of four automated machine learning (AutoML) methods for recommending the best classification algorithm for a given input dataset. Three of these methods are based on evolutionary algorithms (EAs), and the other is Auto-WEKA, a well-known AutoML method based on the combined algorithm selection and hyper-parameter optimisation (CASH) approach. The EA-based methods build classification algorithms from a single machine learning paradigm: either decision-tree induction, rule induction, or Bayesian network classification. Auto-WEKA combines algorithm selection and hyper-parameter optimisation to recommend classification algorithms from multiple paradigms. We performed controlled experiments in which all four AutoML methods were given the same runtime limit, for different values of this limit. In general, the differences in predictive accuracy among the three best AutoML methods were not statistically significant. However, the EA that evolves decision-tree induction algorithms has the advantage of producing algorithms that generate interpretable classification models and that are more scalable to large datasets than many of the algorithms from other learning paradigms that Auto-WEKA can recommend. We also observed that Auto-WEKA exhibited meta-overfitting, a form of overfitting at the meta-learning level rather than at the base-learning level.
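    A minimal sketch of such a controlled comparison protocol, with scikit-learn classifiers standing in for the actual AutoML methods; the shared time limit is left as an unused parameter, and the choice of a Friedman test for significance is an illustrative assumption:

    ```python
    # Sketch: every method is evaluated on every dataset under the same runtime budget, and
    # the per-dataset accuracies are compared with a Friedman test. The stand-in "methods"
    # below are plain scikit-learn classifiers, not Auto-WEKA or the EA-based systems.
    import numpy as np
    from scipy.stats import friedmanchisquare
    from sklearn.datasets import load_iris, load_wine, load_breast_cancer, load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression

    METHODS = {  # placeholders; a real run would pass the shared time limit through
        "tree_method": lambda: DecisionTreeClassifier(random_state=0),
        "bayes_method": lambda: GaussianNB(),
        "cash_method": lambda: LogisticRegression(max_iter=2000),
    }

    def evaluate(loaders, methods, time_limit=3600):
        accuracies = {name: [] for name in methods}
        for loader in loaders:
            X, y = loader(return_X_y=True)
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
            for name, make in methods.items():
                model = make().fit(X_tr, y_tr)  # real AutoML methods would obey time_limit
                accuracies[name].append(model.score(X_te, y_te))
        stat, p = friedmanchisquare(*accuracies.values())
        return accuracies, p

    accs, p = evaluate([load_iris, load_wine, load_breast_cancer, load_digits], METHODS)
    print({k: np.round(v, 3).tolist() for k, v in accs.items()}, "Friedman p =", round(p, 3))
    ```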

    Measures and models for causal inference in cross-sectional studies: arguments for the appropriateness of the prevalence odds ratio and related logistic regression

    Background: Several papers have discussed which effect measures are appropriate to capture the contrast between exposure groups in cross-sectional studies, and which related multivariate models are suitable. Although some have favored the Prevalence Ratio over the Prevalence Odds Ratio, thus suggesting the use of log-binomial or robust Poisson models instead of logistic regression, this debate is still far from settled and requires close scrutiny. Discussion: In order to evaluate how accurately true causal parameters such as the Incidence Density Ratio (IDR) or the Cumulative Incidence Ratio (CIR) are estimated, this paper presents a series of scenarios in which a researcher happens to find a preset ratio of prevalences in a given cross-sectional study. Results show that, provided essential and non-waivable conditions for causal inference are met, the CIR is most often inestimable, whether through the Prevalence Ratio or the Prevalence Odds Ratio, and that the latter is the measure that consistently yields an appropriate estimate of the Incidence Density Ratio. Summary: Multivariate regression models should be avoided when the assumptions for causal inference from cross-sectional data do not hold. Nevertheless, if these assumptions are met, the logistic regression model is best suited for this task, as it provides a suitable estimate of the Incidence Density Ratio.
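    The link between the Prevalence Odds Ratio and the Incidence Density Ratio rests on the standard steady-state identity, written out below with notation assumed here (not taken from the paper), under stationarity and equal mean disease duration across exposure groups:

    ```latex
    % In a stationary population, prevalence odds = incidence density times mean disease duration:
    \[
      \frac{P}{1-P} = \mathrm{ID} \times \bar{D}
    \]
    % Taking the ratio of exposed (1) to unexposed (0) groups gives the prevalence odds ratio:
    \[
      \mathrm{POR}
        = \frac{P_1/(1-P_1)}{P_0/(1-P_0)}
        = \frac{\mathrm{ID}_1 \, \bar{D}_1}{\mathrm{ID}_0 \, \bar{D}_0}
        = \mathrm{IDR} \quad \text{if } \bar{D}_1 = \bar{D}_0 .
    \]
    ```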

    DNA-Based Diet Analysis for Any Predator

    Background: Prey DNA from diet samples can be used as a dietary marker; yet current methods for prey detection require a priori diet knowledge and/or are designed ad hoc, limiting their scope. I present a general approach to detect diverse prey in the feces or gut contents of predators. Methodology/Principal Findings: In the example outlined, I take advantage of the restriction site for the endonuclease Pac I which is present in 16S mtDNA of most Odontoceti mammals, but absent from most other relevant non-mammalian chordates and invertebrates. Thus in DNA extracted from feces of these mammalian predators Pac I will cleave and exclude predator DNA from a small region targeted by novel universal primers, while most prey DNA remain intact allowing prey selective PCR. The method was optimized using scat samples from captive bottlenose dolphins (Tursiops truncatus) fed a diet of 6–10 prey species from three phlya. Up to five prey from two phyla were detected in a single scat and all but one minor prey item (2% of the overall diet) were detected across all samples. The same method was applied to scat samples from free-ranging bottlenose dolphins; up to seven prey taxa were detected in a single scat and 13 prey taxa from eight teleost families were identified in total. Conclusions/Significance: Data and further examples are provided to facilitate rapid transfer of this approach to any predator. This methodology should prove useful to zoologists using DNA-based diet techniques in a wide variety of study systems
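    A minimal sketch of the site-screening logic implied here, assuming the canonical Pac I recognition sequence TTAATTAA; the sequence strings and names below are made-up placeholders, not real 16S data:

    ```python
    # Sketch: flag candidate 16S amplicon sequences that carry the Pac I site and would
    # therefore be cleaved before PCR (predator templates), versus site-free templates
    # that stay intact and remain amplifiable (prey templates).
    PACI_SITE = "TTAATTAA"  # 8-bp palindromic recognition sequence of Pac I

    def would_be_cleaved(seq: str) -> bool:
        """True if the sequence contains the Pac I recognition site."""
        # The site is palindromic, so a single-strand check is sufficient.
        return PACI_SITE in seq.upper()

    amplicons = {
        "dolphin_16S_fragment": "ACGTTTAATTAAGGCATTAGC",  # placeholder, carries the site
        "herring_16S_fragment": "ACGTTTAACTAAGGCATTAGC",  # placeholder, no site
    }

    for name, seq in amplicons.items():
        status = "cleaved (excluded from PCR)" if would_be_cleaved(seq) else "intact (amplifiable)"
        print(f"{name}: {status}")
    ```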

    Automated machine learning for studying the trade-off between predictive accuracy and interpretability

    Automated Machine Learning (Auto-ML) methods search for the best classification algorithm and its best hyper-parameter settings for each input dataset. Auto-ML methods normally maximize only predictive accuracy, ignoring the classification model's interpretability, an important criterion in many applications. Hence, we propose a novel approach, based on Auto-ML, to investigate the trade-off between the predictive accuracy and the interpretability of classification-model representations. The experiments used the Auto-WEKA tool to investigate this trade-off. We distinguish between white box (interpretable) model representations and two other types of model representations: black box (non-interpretable) and grey box (partly interpretable). We consider as white box the models based on the following six interpretable knowledge representations: decision trees, If-Then classification rules, decision tables, Bayesian network classifiers, nearest neighbours and logistic regression. The experiments used 16 datasets and two runtime limits per Auto-WEKA run: 5 h and 20 h. Overall, the best white box model was more accurate than the best non-white box model in 4 of the 16 datasets in the 5-hour runs, and in 7 of the 16 datasets in the 20-hour runs. However, the predictive accuracy differences between the best white box and best non-white box models were often very small. If we accept a predictive accuracy loss of 1% in order to benefit from the interpretability of a white box model representation, we would prefer the best white box model in 8 of the 16 datasets in the 5-hour runs, and in 10 of the 16 datasets in the 20-hour runs.
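    A minimal sketch of the 1% tolerance rule described above; the per-dataset accuracy values are made-up placeholders, not the paper's results:

    ```python
    # Sketch: prefer the best white-box (interpretable) model whenever its accuracy is within
    # 1 percentage point of the best non-white-box model found under the same runtime limit.
    TOLERANCE = 0.01

    def prefer_white_box(acc_white: float, acc_other: float, tolerance: float = TOLERANCE) -> bool:
        """Prefer the interpretable model unless it loses more than `tolerance` accuracy."""
        return acc_white >= acc_other - tolerance

    results = [  # (dataset, best white-box accuracy, best non-white-box accuracy), placeholders
        ("dataset_a", 0.91, 0.92),
        ("dataset_b", 0.84, 0.88),
        ("dataset_c", 0.97, 0.96),
    ]

    preferred = [name for name, w, o in results if prefer_white_box(w, o)]
    print(f"white-box preferred on {len(preferred)} of {len(results)} datasets: {preferred}")
    ```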